Beyond ASR 1-best: Using word confusion networks in spoken language understanding

نویسندگان

  • Dilek Z. Hakkani-Tür
  • Frédéric Béchet
  • Giuseppe Riccardi
  • Gökhan Tür
چکیده

We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State of the art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR one-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by using both word lattices and WCNs, 6–10% absolute. The processing of WCNs was 25 times faster than lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output. 2005 Elsevier Ltd. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving Spoken Languag Using Word Confusion

A natural language spoken dialog system includes a large vocabulary automatic speech recognition (ASR) engine, whose output is used as the input of a spoken language understanding component. Two challenges in such a framework are that the ASR component is far from being perfect and the users can say the same thing in very different ways. So, it is very important to be tolerant to recognition er...

متن کامل

Using word confusion networks for slot filling in spoken language understanding

Semantic slot filling is one of the most challenging problems in spoken language understanding (SLU) because of automatic speech recognition (ASR) errors. To improve the performance of slot filling, a successful approach is to use a statistical model that is trained on ASR one-best hypotheses. The state of the art models for slot filling rely on using discriminative sequence modeling methods, s...

متن کامل

Semantic parsing using word confusion networks with conditional random fields

A challenge in large vocabulary spoken language understanding (SLU) is robustness to automatic speech recognition (ASR) errors. The state of the art approaches for semantic parsing rely on using discriminative sequence classification methods, such as conditional random fields (CRFs). Most dialog systems employ a cascaded approach where the best hypotheses from the ASR system are fed into the fo...

متن کامل

Conditional use of word lattices, confusion networks and 1-best string hypotheses in a sequential interpretation strategy

Within the context of a deployed spoken dialog service, this study presents a new interpretation strategy based on the sequential use of different ASR output representations: 1-best strings, word lattices and confusion networks. The goal is to reject as early as possible in the decoding process the nonrelevant messages containing non-speech or out-of-domain content. This is done through the 1-p...

متن کامل

Spoken language translation systems ************ ASR word lattice translation with exhaustive reordering is possible

This paper shows how ASR word lattices can be translated even when exhaustive reordering is required for good translation quality. We propose a method for labeling lattice word hypotheses with position information derived from a confusion network (CN). This information is effectively used in the statistical phrase-based machine translation (MT) search to reduce its complexity, which makes even ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computer Speech & Language

دوره 20  شماره 

صفحات  -

تاریخ انتشار 2006